data mining technique
Leveraging Data Mining Algorithms to Recommend Source Code Changes
Naghshzan, AmirHossein, Khalilazar, Saeed, Poilane, Pierre, Baysal, Olga, Guerrouj, Latifa, Khomh, Foutse
Context: Recent research has used data mining to develop techniques that can guide developers through source code changes. To the best of our knowledge, very few studies have investigated data mining techniques and/or compared their results with other algorithms or a baseline. Objectives: This paper proposes an automatic method for recommending source code changes using four data mining algorithms. We not only use these algorithms to recommend source code changes, but we also conduct an empirical evaluation. Methods: Our investigation includes seven open-source projects from which we extracted source change history at the file level. We used four widely used data mining algorithms, i.e., Apriori, FP-Growth, Eclat, and Relim, and compared them in terms of performance (Precision, Recall, and F-measure) and execution time. Results: Our findings provide empirical evidence that while some Frequent Pattern Mining algorithms, such as Apriori, may outperform other algorithms in some cases, the results are not consistent across all the software projects, which is most likely due to the nature and characteristics of the studied projects, in particular their change history. Conclusion: Apriori seems appropriate for large-scale projects, whereas Eclat appears to be suitable for small-scale projects. Moreover, FP-Growth seems an efficient approach in terms of execution time.
- North America > Canada > Ontario > National Capital Region > Ottawa (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Materials > Metals & Mining (0.50)
- Health & Medicine (0.46)
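To make the idea in the abstract above concrete, here is a minimal sketch of Apriori-style mining over file-level change history, followed by a Precision/Recall/F-measure check. It is not the authors' implementation: the commit data, file names, thresholds, and the helper names `apriori`, `recommend`, and `precision_recall_f` are all hypothetical, and recommendations are drawn from frequent pairs only for brevity.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into size-k candidates,
        # keeping only candidates whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count step: keep candidates that reach the support threshold.
        current = {}
        for cand in candidates:
            n = sum(1 for t in transactions if cand <= t)
            if n >= min_support:
                current[cand] = n
        frequent.update(current)
        k += 1
    return frequent


def recommend(frequent, changed_file, min_confidence):
    """Suggest files that co-change with changed_file, using frequent pairs."""
    base = frequent.get(frozenset([changed_file]))
    if not base:
        return {}
    recs = {}
    for itemset, count in frequent.items():
        if len(itemset) == 2 and changed_file in itemset:
            (other,) = itemset - {changed_file}
            confidence = count / base
            if confidence >= min_confidence:
                recs[other] = confidence
    return recs


def precision_recall_f(recommended, actual):
    """Precision, recall, and F-measure of a recommended set vs. the truth."""
    tp = len(set(recommended) & set(actual))
    precision = tp / len(recommended) if recommended else 0.0
    recall = tp / len(actual) if actual else 0.0
    f = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f


# Hypothetical change history: each set is the files touched by one commit.
commits = [
    {"a.py", "b.py"},
    {"a.py", "b.py", "c.py"},
    {"a.py", "c.py"},
    {"b.py", "d.py"},
    {"a.py", "b.py"},
]
frequent = apriori(commits, min_support=2)
recs = recommend(frequent, "a.py", min_confidence=0.6)
scores = precision_recall_f(set(recs), {"b.py", "c.py"})
```

With this toy history, a change to `a.py` yields `b.py` as the only recommendation at the chosen confidence threshold. FP-Growth, Eclat, and Relim would produce the same frequent itemsets by different search strategies, which is why the abstract compares them on execution time as well as on accuracy.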
An Improved Heart Disease Prediction Using Stacked Ensemble Method
Islam, Md. Maidul, Tania, Tanzina Nasrin, Akter, Sharmin, Shakib, Kazi Hassan
Heart disorder has just overtaken cancer as the world's biggest cause of mortality. Cardiac failures, heart disease mortality, and diagnostic costs can all be reduced with early identification and treatment. The healthcare industry collects medical data in large quantities, but that data is not well mined. Discovering previously unknown patterns and connections in this information can support better decisions when forecasting heart disorder risk. In the proposed study, we constructed an ML-based diagnostic system for heart illness forecasting, using a heart disorder dataset. We used data preprocessing techniques such as outlier detection and removal, checking for and removing missing entries, feature normalization, and cross-validation; nine classification algorithms (RF, MLP, KNN, ETC, XGB, SVC, ADB, DT, and GBM); and eight performance metrics (classification accuracy, precision, F1 score, specificity, ROC, sensitivity, log-loss, and Matthews' correlation coefficient). Our method can easily differentiate between people who have cardiac disease and those who are healthy. Receiver operating characteristic curves and the areas under those curves were determined for every classifier. Most of the classifiers, pretreatment strategies, validation methods, and performance assessment metrics for classification models are discussed in this study. The performance of the proposed scheme has been confirmed, utilizing all of its capabilities. In this work, the impact of clinical decision support systems was evaluated using a stacked ensemble approach that included these nine algorithms.
- Europe > Hungary (0.05)
- North America > United States (0.04)
- Europe > Switzerland (0.04)
A Predictive Model using Machine Learning Algorithm in Identifying Students Probability on Passing Semestral Course
This study aims to determine a predictive model for estimating, at the earliest stage of the semester, students' probability of passing their courses. To discover a good predictive model with a high acceptability, accuracy, and precision rate, one that delivers a useful outcome for decision making in education systems, improves the processes of conveying knowledge, and uplifts students' academic performance, the proponent applied and strictly followed the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology. This study employs classification as the data mining technique and a decision tree as the algorithm. Using the newly discovered predictive model, the prediction of students' probabilities of passing their current courses yields 0.7619 accuracy, 0.8333 precision, 0.8823 recall, and 0.8571 F1 score, which shows that the model used in the prediction is reliable, accurate, and recommendable. Considering the indicators and the results, it can be noted that the prediction model used in this study is highly acceptable. Data mining techniques provide effective and efficient innovative tools for analyzing and predicting student performance. The model used in this study will greatly affect the way educators understand and identify the weaknesses of their students in class and the way they improve the effectiveness of learning processes geared to their students, and it can help bring down academic failure rates and help institution administrators modify their learning system outcomes. Further study is highly needed on the inclusion of students' demographic information, a larger amount of data in the dataset, and automated and manual processing of predictive criteria indicators, so that students can see, as early as the midterm period, which criteria they must improve in order to pass their courses at the end of the semester.
- Asia > Philippines > Luzon > Calabarzon > Province of Cavite (0.14)
- Asia > Philippines > Luzon > National Capital Region > City of Manila (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Materials > Metals & Mining (1.00)
- Education > Educational Setting (1.00)
- Education > Assessment & Standards > Student Performance (1.00)
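The four scores reported in the abstract above are all derived from one confusion matrix, and a short sketch shows how they relate. The counts below (15 true positives, 3 false positives, 2 false negatives, 1 true negative) are an assumption: the abstract does not give them, and they are used here only because they happen to reproduce the reported figures. The function name `classification_metrics` is likewise hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Assumed counts (not stated in the abstract), where "positive" means "passes".
metrics = classification_metrics(tp=15, fp=3, fn=2, tn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts the test set would hold only 21 students, and only one of the four who did not pass would be identified correctly. Accuracy, precision, and recall alone do not expose that kind of class imbalance, so they are worth reading alongside the underlying counts.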
Make Data Work for You with These Top Data Mining Tools and Techniques
With everything going computerized and digital, the amount of data we generate is humongous. Organizations collectively spend billions of dollars just to store and analyze this data. They work to derive valuable business insights from it using data mining. Data mining is the process of discovering hidden patterns in a pile of big data. Business executives use these emerging patterns to make informed business strategy decisions.
Petroleum prices prediction using data mining techniques -- A Review
Weldon, Kiplang'at, Ngechu, John, Everlyne, Ngatho, Njambi, Nancy, Gikunda, Kinyua
Over the past 20 years, Kenya's demand for petroleum products has proliferated. This is mainly because this particular commodity is used in many sectors of the country's economy. Exchange rates are impacted by constantly shifting prices, which also impact Kenya's industrial output of commodities. The cost of other items produced and even the expansion of the economy is significantly impacted by any change in the price of petroleum products. Therefore, accurate petroleum price forecasting is critical for devising policies that are suitable to curb fuel-related shocks. Data mining techniques are the tools used to find valuable patterns in data. Data mining techniques used in petroleum price prediction, including artificial neural networks (ANNs), support vector machines (SVMs), and intelligent optimization techniques like the genetic algorithm (GA), have grown increasingly popular. This study provides a comprehensive review of the existing data mining techniques for making predictions on petroleum prices. The data mining techniques are classified into regression models, deep neural network models, fuzzy sets and logic, and hybrid models. A detailed discussion of how these models are developed and the accuracy of the models is provided.
- Research Report > New Finding (0.68)
- Research Report > Experimental Study (0.66)
Common human diseases prediction using machine learning based on survey data
Nahian, Jabir Al, Masum, Abu Kaisar Mohammad, Abujar, Sheikh, Mia, Md. Jueal
In this era, the moment has arrived to move away from disease as the primary emphasis of medical treatment. Multiple impressive techniques have been developed to detect diseases. At present, COVID-19, normal flu, migraine, lung disease, heart disease, kidney disease, diabetes, stomach disease, gastric disease, bone disease, and autism are very common diseases. In this analysis, we analyze disease symptoms and make disease predictions based on those symptoms. We studied a range of symptoms and surveyed people in order to complete the task. Several classification algorithms have been employed to train the model. Furthermore, performance evaluation metrics are used to measure the model's performance. Finally, we discovered that the part classifier surpasses the others.
- Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.05)
- North America > United States > Florida > Brevard County > Melbourne (0.04)
- Asia > South Korea (0.04)
- Asia > Singapore (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Privacy Enhancing Technologies and why they're vital for healthcare innovation
The COVID-19 pandemic has supercharged the scope of the issues the global healthcare industry was already grappling with. When the pandemic arrived, healthcare organisations often struggled to find the basic information they needed to respond -- whether it was disease and death rates or the availability of hospital beds and critical supplies. Among other problems, the pandemic highlighted the desperate need for collaborative data analytics in healthcare. As McKinsey observed, healthcare's digital barriers are often decidedly non-technological. The technology is out there (or rapidly evolving) -- in October 2020, Pfizer and IBM researchers announced that they have developed a machine learning technique that can predict Alzheimer's disease years before symptoms develop.
- North America > United States (0.14)
- Oceania > Australia (0.05)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.54)
Principal Data Scientist
Crossix is a health-focused technology company dedicated to advancing healthcare marketing with analytics and innovative planning, targeting, measurement, and optimization solutions. Positioned at the center of big data, innovative technology, and multichannel media, Crossix, a Veeva Company, provides our clients with insights to help make strategic business decisions and drive improved patient outcomes. Crossix knows that our employees are integral to our success, which is why we have created an inclusive culture where everyone can thrive. Along with competitive salaries and benefits, we invest in opportunities for career growth, and provide other perks, such as team outings, fitness allowances and professional development. Crossix is headquartered in New York with growing offices in Minsk, Belarus and Kiryat Ono, Israel.
- North America > United States > New York (0.25)
- Europe > Belarus > Minsk Region > Minsk (0.25)
- Asia > Middle East > Israel (0.25)
- Health & Medicine (1.00)
- Education > Educational Setting (0.37)
Feature selection for medical diagnosis: Evaluation for using a hybrid Stacked-Genetic approach in the diagnosis of heart disease
Abdollahi, Jafar, Nouri-Moghaddam, Babak
Background and purpose: Heart disease has been one of the most important causes of death in the last 10 years, so the use of classification methods to diagnose and predict heart disease is very important. If this disease is predicted before its onset, it is possible to prevent its high mortality and provide more accurate and efficient treatment methods. Materials and Methods: Depending on the selection of input features, the use of basic algorithms can be very time-consuming. Reducing dimensions or choosing a good subset of features, without sacrificing accuracy, is of great importance for the successful use of basic algorithms in this domain. In this paper, we propose an ensemble-genetic learning method using wrapper feature reduction to select features in disease classification. Findings: The development of a medical diagnosis system based on ensemble learning to predict heart disease provides a more accurate diagnosis than the traditional method and reduces the cost of treatment. Conclusion: The results showed that Thallium Scan and vascular occlusion were the most important features in the diagnosis of heart disease and can distinguish between sick and healthy people with 97.57% accuracy.
- Asia > Middle East > Iran (0.29)
- North America > United States (0.28)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.47)